Abstract: Since their invention, polar codes have received a lot
of attention because of their capacity-achieving performance and
low encoding and decoding complexity. Successive cancellation
decoding (SCD) and belief propagation decoding (BPD) are two
approaches for decoding polar codes. SCD achieves good
error-correcting performance and is less computationally
expensive than BPD. However, SCD suffers from long
latency due to the serial nature of the successive cancellation
algorithm. BPD is parallel in nature and hence is more attractive
for low-latency applications. However, since it is iterative, the
required latency and energy dissipation increase linearly with
the number of iterations. In this work, we borrow the idea of SCD
and propose a novel scheme based on sub-factor-graph freezing to
reduce the average number of computations as well as the average
number of iterations required by BPD, which directly translates
into lower latency and energy dissipation. Simulation results show
that the proposed scheme has no performance degradation and
achieves a significant reduction in computation complexity over
existing methods.

Index Terms: Belief propagation decoding (BPD); successive
cancellation decoding (SCD); energy efficiency; iterative
decoders; factor graph; polar codes

I. INTRODUCTION

Shannon proved the existence of a maximum data transmission
rate, called the channel capacity [1]. Since then, different
capacity-approaching codes have been designed, such as Turbo
codes [2] and LDPC codes [3]. The first provably
capacity-achieving codes, polar codes, were recently invented by
Arikan [4]. Polar codes are considered to be a major breakthrough
in coding theory, since they are the first family of codes known
to achieve channel capacity with explicit construction. Besides
achieving the capacity for binary-input symmetric memoryless
channels [4], polar codes were also proved in [5] to be able to
achieve the capacity for any discrete and continuous
memoryless channel. Moreover, an explicit construction method for
polar codes was provided and it was shown that they can be
efficiently encoded and decoded with complexity $O(n \log n)$,
where $n$ is the code length. Since then, polar codes have
become one of the most popular topics in information theory
and have attracted a lot of attention.

Several decoding methods are available for decoding polar
codes; SCD and its variants and BPD are two popular
methods. SC decoders suffer from long latency due to the
serial nature of the SC algorithm. However, the SC algorithm
requires less computation than BPD. Based on this
property, several high-throughput low-cost SC decoders were
reported in [7]-[11]. Another advantage of the SC algorithm
is its ability to achieve good error-correcting performance for
long code lengths. For short code lengths, the list decoding
and stack decoding methods, which are based on SCD, also
achieve good error-correcting performance [12]-[15].

On the other hand, polar BP decoders [16]-[21] have the
intrinsic advantage of parallel processing. Therefore, compared
with their SC counterparts, polar BP decoders are more
attractive for low-latency applications. For iterative decoders
(such as polar BP decoders), the required latency and energy
dissipation increase linearly with the number of iterations.
The need for a large number of iterations therefore makes
BP decoders suffer from high computation complexity, and
hence polar BP decoders are still not as attractive as their SC
counterparts. To this end, another decoding method, called
soft cancellation (SCAN) decoding, was proposed in [22]. By
restricting the soft information propagation schedule in the
decoding process, SCAN achieves a much lower computational
complexity than BPD. However, unlike BPD,
the SCAN operation is serial in nature, leading to a much
longer decoding latency. Hence, aiming at a low-latency
polar code decoder, we concentrate on BPD in this work.

To address the issues of the large number of iterations and
high computation complexity inherent in BP decoders, Yuan
et al. [20] proposed a G-matrix-based early stopping scheme,
which is based on the fact that iterative decoders normally
converge before reaching a fixed maximum number of
iterations. The G-matrix-based stopping criterion can then be
used to stop the computation once convergence has been reached.
To further reduce the computation complexity, in this paper we
propose a method based on the convergence of the
sub-factor-graphs, which is reached at a much earlier stage.
Borrowing the idea from SCD, some of the sub-factor-graphs are
checked during each iteration and, if they have converged, they
are frozen and do not need to be computed in the subsequent
iterations. The freezing of these sub-factor-graphs also
helps to improve the convergence of the decoding process
over the rest of the factor graph. As a result, both the
computation complexity and the average number of iterations are
reduced. Experimental results show that our proposed method
results in about 40-46% lower computation complexity,
as well as lower latency, when compared to the previously
proposed early stopping scheme [20].

Fig. 1. Encoding signal flow graph of the (8,4) polar code.

Fig. 2. (a) Factor graph of the (8,4) polar code. (b) Processing element for BPD.

Notations 

In this paper, the following notation conventions are used.
Matrices are denoted by boldface capital letters, and vectors
by boldface lowercase letters. The subscript $m$ of a matrix
$\mathbf{M}_m$ indicates an $m \times m$ square matrix, and
$\mathbf{v}_m$ denotes an $m \times 1$ vector. $\mathbf{x}[i]$
stands for the $i$-th element of vector $\mathbf{x}$,
$\mathbf{x}^t$ stands for vector $\mathbf{x}$ at the $t$-th
iteration, and $\mathbf{x}_{(a:b)}$ represents the sub-vector
of $\mathbf{x}$ with starting and ending indices $a$ and $b$. The
transpose of a vector $\mathbf{x}$ is denoted by $\mathbf{x}^T$.

II. POLAR CODES OVERVIEW 

Polar codes are based on the phenomenon of “channel
polarization”. More precisely, by recursively combining and
splitting individual channels, some of these channels become
essentially error-free, while others become completely noisy.
Furthermore, the fraction of the noiseless channels tends
towards the capacity of the underlying binary symmetric
channels [4]. Therefore, an $(n, k)$ polar code can be generated
in two steps. First, an $n$-bit message $\mathbf{u}$ is constructed by
assigning the $k$ reliable and $(n - k)$ unreliable positions
as information bits and “0” bits, respectively. The $(n - k)$
unreliable positions, which are forced to 0, are called the
frozen bits (their index set is known as the frozen set
$\mathcal{A}^c$). Then, the $n$-bit $\mathbf{u}$ is multiplied with the
generator matrix $\mathbf{G} = \mathbf{F}^{\otimes m}$ to
generate an $n$-bit transmitted codeword $\mathbf{x}$, where
$\mathbf{F}^{\otimes m}$ is the $m$th Kronecker power of
$\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$
and $m = \log_2 n$. Fig. 1 shows the encoding signal flow graph
for $n = 8$ polar codes, where the “$\oplus$” sign represents
the XOR operation.
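The two encoding steps can be illustrated with a minimal Python sketch. It assumes NumPy and the generator matrix $\mathbf{G} = \mathbf{F}^{\otimes m}$ as defined above; the function names and the choice of frozen positions for the (8,4) example (0-based indices 0, 1, 2 and 4) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=int)

def kron_power(m):
    """Return the m-th Kronecker power of F (m >= 1)."""
    out = F.copy()
    for _ in range(m - 1):
        out = np.kron(out, F)
    return out

def polar_encode(u):
    """Encode a length-n source word (n = 2^m) as x = u G over GF(2)."""
    u = np.asarray(u, dtype=int)
    n = int(u.size)
    m = n.bit_length() - 1
    assert m >= 1 and n == 1 << m, "code length must be a power of two"
    return (u @ kron_power(m)) % 2

# Hypothetical (8,4) example: positions 0, 1, 2, 4 are frozen to 0,
# positions 3, 5, 6, 7 carry the information bits (here all ones).
u = np.array([0, 0, 0, 1, 0, 1, 1, 1])
x = polar_encode(u)
```

Because $\mathbf{F}^{\otimes m}$ is its own inverse over GF(2), applying `polar_encode` to `x` returns `u` again, which is the property the hard-decision check in Section III relies on.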


A. Belief Propagation Algorithm for Polar Code Decoding 


Similar to LDPC codes, polar codes
can be decoded by applying the belief propagation (BP)
algorithm over their factor graphs. For an $(n, k)$ polar code
($n = 2^m$), the factor graph is an $m$-stage network consisting
of $n(m + 1)$ nodes, where each node is associated with a
right-to-left and a left-to-right likelihood message denoted by
$L_{i,j}$ and $R_{i,j}$, respectively. $L^t_{i,j}$ denotes the
right-to-left likelihood message of the $i$-th node at the
$j$-th stage and the $t$-th iteration. Fig. 2(a) shows an
example of a 3-stage factor graph for $n = 8$ polar codes. Here
each stage consists of $n/2 = 4$ processing elements (PEs).
During the BP decoding procedure, these messages are propagated
and updated among adjacent nodes using the min-sum updating rule
of [20], referred to as (1) in the rest of this paper.

$\alpha$ is a scaling parameter introduced in [21] to improve
the decoding performance of a BP decoder. According
to the decoding procedure of the BP algorithm, PEs are activated
stage by stage from left to right in each iteration. After the
number of iterations reaches the specified maximum number
(max_iter), node $(i, m+1)$ outputs the decoded information
bit $\hat{u}_i$ based on the hard decision of its messages.
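The update equations themselves are not reproduced here. As a rough illustration only, the following Python sketch shows one widely used form of the scaled min-sum PE update for polar BP decoding. The port names (`in_u_*` for messages arriving from the source-word side, `in_x_*` for messages arriving from the codeword side) and the function names are assumptions, and the indexing may differ from the convention of equation (1).

```python
import math

def g(a, b, alpha=0.9375):
    """Scaled min-sum approximation: alpha * sign(a) * sign(b) * min(|a|, |b|)."""
    return alpha * math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

def pe_update(in_u_top, in_u_bot, in_x_top, in_x_bot, alpha=0.9375):
    """Update the four outgoing messages of one 2x2 processing element.

    The PE is assumed to implement x_top = u_top XOR u_bot, x_bot = u_bot.
    Returns (out_u_top, out_u_bot, out_x_top, out_x_bot).
    """
    out_u_top = g(in_x_top, in_x_bot + in_u_bot, alpha)
    out_u_bot = g(in_u_top, in_x_top, alpha) + in_x_bot
    out_x_top = g(in_u_top, in_x_bot + in_u_bot, alpha)
    out_x_bot = g(in_u_top, in_x_top, alpha) + in_u_bot
    return out_u_top, out_u_bot, out_x_top, out_x_bot
```

When one incoming message is infinite (a bit known with certainty), `g` simply passes the other argument through, scaled by `alpha`, which is why fixing messages to $\pm\infty$ makes the remaining updates more reliable.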


III. THE PROPOSED SCHEME 

Fig. 3(a) shows the scheduling tree of the successive
cancellation decoding (SCD) of the (8,4) polar code, and Fig.
3(b) depicts the equivalent BPD factor graph of the same
(8,4) polar code. At each stage the SCD scheduling tree is
split into a number of sub-trees, each of which is responsible
for decoding a corresponding constituent code. The size of
the sub-trees is reduced by half when moving from one stage
to the next.

Fig. 3. Correspondence between SCD scheduling tree and BPD factor graph.
(a) SCD scheduling tree. (b) BPD factor graph. (c) 2 CSFGs at stage 1.
(d) 4 CSFGs at stage 2.

Fig. 4. SCD scheduling tree and factor graph of the (8,4) polar code.

Fig. 5. Example illustrating the proposed scheme. (a) Checking the first
CSFG at stage 1. (b) Checking the second CSFG at stage 1. (c) Checking
the third CSFG at stage 2.

Before presenting the details of our proposed scheme, we
first introduce the notion of the connected sub-factor-graph. A
connected sub-factor-graph (CSFG) is defined as a
sub-factor-graph which has the same number of inputs and
outputs, whose output nodes are at stage $m + 1$, and in which
each input is connected to each output through some PEs in the
sub-factor-graph. Fig. 3(b) shows two examples of CSFGs. It can
be seen that each CSFG has a corresponding sub-tree in the
scheduling tree of SCD. Fig. 3(a) and (b) show examples
of the corresponding sub-trees and connected sub-factor-graphs
of the (8,4) polar code. The number of CSFGs at each
stage is given by $2^j$, where $j$ is the stage number. For the
(8,4) polar code, as shown in Fig. 3(c) and (d), the numbers
of CSFGs at stages 1 and 2 are 2 and 4, respectively.

At each iteration $t$, the nodes at stage $j$ in the BPD factor
graph output left-to-right LLR-based propagating messages
$\mathbf{R}^t_{1:n,\,j+1}$; these are the inputs to the $2^j$
CSFGs at stage $j$. $\mathbf{R}^t_{1:2^{m-j},\,j+1}$ are the
inputs to the first CSFG, while
$\mathbf{R}^t_{((k-1)2^{m-j}+1):(k2^{m-j}),\,j+1}$ are those for
the $k$-th CSFG. Each
CSFG is responsible for the decoding of the corresponding
constituent code from its respective input messages.
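The index ranges above can be enumerated with a short Python sketch. The function name `csfg_ranges` is hypothetical; it simply evaluates the $((k-1)2^{m-j}+1):(k2^{m-j})$ ranges for $k = 1, \ldots, 2^j$ using 1-based, inclusive indices.

```python
def csfg_ranges(m, j):
    """Return the 1-based inclusive (start, end) input ranges of the 2^j CSFGs at stage j."""
    size = 1 << (m - j)  # each CSFG at stage j has 2^(m-j) inputs
    return [((k - 1) * size + 1, k * size) for k in range(1, (1 << j) + 1)]

# (8,4) polar code, m = 3
stage1 = csfg_ranges(3, 1)  # two CSFGs of size 4
stage2 = csfg_ranges(3, 2)  # four CSFGs of size 2
```

For the (8,4) code this reproduces the 2 CSFGs at stage 1 and the 4 CSFGs at stage 2 of Fig. 3(c) and (d).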

The proposed scheme borrows the idea of successive
cancellation decoding (SCD), where the results of the previously
decoded bits are used for the decoding of the current bit. Here
we introduce a CSFG freezing concept for a low-complexity
BPD. At a particular iteration $t$, when the message passing
reaches a certain stage $j$, if a CSFG at that stage can correctly
decode its corresponding constituent code (i.e., the CSFG has
reached convergence), it is frozen and no message passing or
updating within the CSFG is needed in the subsequent
iterations. The details of how to check whether a CSFG can
be frozen are presented later.

One important thing is the checking order for the freezing 
of the CSFG. A CSFG can only be frozen if all the previous 
CSFGs (in the order of the decoding bits) at that stage 
have been frozen. If a CSFG is not frozen, it means the 
message values inside it will still be changed in the subsequent 
iterations. Similar to the SCD operation, the message values 
of this CSFG will be used for the decoding of the constituent 
codes of the subsequent CSFGs. Therefore the freezing of the 
CSFGs at a stage has to follow an order based on the decoded 
bit. When a CSFG at a certain stage is checked for freezing, if 
it cannot correctly decode its constituent code, then it cannot 
be frozen and the message passing and updating have to be 
executed for PEs at that stage. After that, we move to the next 
stage and check the convergence of the corresponding CSFGs. 
When we move to the next stage, the number of CSFGs will 
be doubled. This freezing-checking procedure will continue 
from stage to stage until the end of the BPD factor graph is 
reached. 

Next we present how a CSFG can be frozen.
As discussed above, a CSFG corresponds to a sub-tree
in the SCD scheduling tree, which can also be viewed
as a constituent code of the original polar code. At
the $t$-th iteration and stage $j$, the left-to-right propagation
messages $\mathbf{R}^t_{((k-1)2^{m-j}+1):(k2^{m-j}),\,j+1}$
connected to the $k$-th
CSFG can be viewed as the LLR inputs to decode the
corresponding constituent code. We can apply Maximum-Likelihood
Decoding (MLD) on this constituent code with
$\mathbf{R}^t_{((k-1)2^{m-j}+1):(k2^{m-j}),\,j+1}$ as input to
obtain a decoded output vector
$\hat{\mathbf{u}}_{((k-1)2^{m-j}+1):(k2^{m-j})}$, which is a
sub-vector of the source word $\mathbf{u}$ of the original polar
code. As will be shown later, if the freezing of the CSFGs
follows the proposed order, the input messages
$\mathbf{R}^t_{((k-1)2^{m-j}+1):(k2^{m-j}),\,j+1}$ of the CSFG
are reliable enough, and the MLD result
$\hat{\mathbf{u}}_{((k-1)2^{m-j}+1):(k2^{m-j})}$ based on
these input messages can be taken as the decoded result of
the constituent code. The freezing order of the CSFGs has to
follow the decoded bit order, and the top CSFGs at each stage
are frozen first.


Fig. 4 shows the SCD scheduling tree and the factor graph 
of the (8,4) polar code. We can see that the top CSFGs are 
actually corresponding to the first few sub-trees that follow the 
depth-first traversal of the SCD scheduling tree. At the first 
iteration, the input messages to these CSFGs are the same as 
the input LLR messages of the corresponding SCD sub-trees. 
Hence if we can decode the input messages of these CSFGs 
using MLD, the decoding performance on the corresponding 
constituent code will achieve or even exceed that of SCD. 
If the CSFGs cannot be frozen at this iteration, and need 
further iteration to converge, due to the nature of the iterative 
decoding, the reliability of the input messages to these CSFGs 
will become better and hence the input LLR messages of these 
CSFGs will be more reliable than the input messages to the 
SCD sub-tree. As a result the MLD performance will not be 
worse than that of SCD. 

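MLD is based on an exhaustive search and hence has a
huge complexity. To reduce the complexity, a novel checking
criterion is proposed to efficiently find the MLD result of
the constituent code. Let $\mathbf{R}_{1:2^{m-j},\,j+1}$ be the
left-to-right propagation messages of a CSFG at stage $j$. We
obtain a hard decision vector
$\hat{\mathbf{x}}_{2^{m-j}} = [\hat{x}_1 \ldots \hat{x}_{2^{m-j}}]$
for these messages, where

$$\hat{x}_i = \begin{cases} 0 & \text{if } R_{i,j+1} \geq 0 \\ 1 & \text{if } R_{i,j+1} < 0 \end{cases} \qquad (2)$$

Given $\hat{\mathbf{x}}_{2^{m-j}}$ as input to the CSFG, the
decoded bit vector at its output, $\hat{\mathbf{u}}_{2^{m-j}}$,
which is also a sub-vector of the source word $\mathbf{u}$ of the
original polar code, is obtained by the inverse
operation of polar code encoding, which is given as

$$\hat{\mathbf{u}}_{2^{m-j}} = \hat{\mathbf{x}}_{2^{m-j}} \mathbf{F}^{\otimes (m-j)} \qquad (3)$$

where the arithmetic is over GF(2), under which
$\mathbf{F}^{\otimes (m-j)}$ is its own inverse.

Fig. 5(a) shows an example of hard decision decoding. The
CSFG can be frozen if the sub-source-word vector
$\hat{\mathbf{u}}_{2^{m-j}}$ satisfies the following frozen set
criteria:

$$\hat{u}_k = 0, \quad \text{for } k \in \mathcal{A}^c \qquad (4)$$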
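The check defined by (2), (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `csfg_freeze_check`, the boolean `frozen_mask` representation of $\mathcal{A}^c$ restricted to one CSFG, and the example LLR values are assumptions, not part of the paper.

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=int)

def kron_power(k):
    """k-th Kronecker power of F (the 1x1 identity for k = 0)."""
    out = np.array([[1]], dtype=int)
    for _ in range(k):
        out = np.kron(out, F)
    return out

def csfg_freeze_check(llr, frozen_mask):
    """Apply (2)-(4) to one CSFG.

    llr:         left-to-right input LLRs of the CSFG (length 2^(m-j))
    frozen_mask: True where the corresponding source bit is a frozen bit
    Returns (can_freeze, u_hat, x_hat).
    """
    llr = np.asarray(llr, dtype=float)
    frozen_mask = np.asarray(frozen_mask, dtype=bool)
    x_hat = (llr < 0).astype(int)                     # hard decision, eq. (2)
    k = int(llr.size).bit_length() - 1
    u_hat = (x_hat @ kron_power(k)) % 2               # inverse encoding, eq. (3)
    can_freeze = not bool(np.any(u_hat[frozen_mask])) # frozen set criteria, eq. (4)
    return can_freeze, u_hat, x_hat

# Hypothetical length-4 constituent code whose first three source bits are frozen.
mask = [True, True, True, False]
ok_a, u_a, _ = csfg_freeze_check([-1.0, -2.0, -0.5, -3.0], mask)
ok_b, u_b, _ = csfg_freeze_check([-1.0, 2.0, 1.0, 3.0], mask)
```

In the first call the hard decision is the all-ones word, which decodes to a source word whose frozen bits are all zero, so the CSFG can be frozen. In the second call the decoded source word has a 1 in a frozen position, so the check fails and the CSFG must keep iterating.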

The following lemma shows that if the frozen set criteria
(4) are satisfied, the sub-source-word vector
$\hat{\mathbf{u}}_{2^{m-j}}$ obtained by (3) is indeed the
decoding result of the MLD on the
corresponding constituent code of the CSFG.

Lemma 1. Let $\mathbf{R}_{1:2^{m-j},\,j+1}$ and
$\hat{\mathbf{x}}_{2^{m-j}}$ be the input LLR
messages and the hard decision vector based on (2) for the
corresponding CSFG at the $j$-th stage. If
$\hat{\mathbf{u}}_{2^{m-j}}$ is obtained from
$\hat{\mathbf{x}}_{2^{m-j}}$ based on (3) and it satisfies the
frozen set criteria of (4), then $\hat{\mathbf{u}}_{2^{m-j}}$ is
the maximum likelihood decoding (MLD) result
of the corresponding constituent code with input messages
$\mathbf{R}_{1:2^{m-j},\,j+1}$.

Proof: The CSFG at the $j$-th stage represents a short
polar (constituent) code of length $2^{m-j}$. Its input and
output are related by
$\mathbf{x}_{2^{m-j}} = \mathbf{u}_{2^{m-j}} \mathbf{F}^{\otimes (m-j)}$.
From [6] and [10], given the input LLR
$\mathbf{R}_{1:2^{m-j},\,j+1}$, the likelihood value of
an arbitrary source word $\mathbf{u}_{2^{m-j}}$ is given by
$\sum_{i=1}^{2^{m-j}} (1 - 2\mathbf{x}_{2^{m-j}}[i])\, \mathbf{R}_{1:2^{m-j},\,j+1}[i]$,
where $\mathbf{x}_{2^{m-j}} = \mathbf{u}_{2^{m-j}} \mathbf{F}^{\otimes (m-j)}$.
If no source word bit is a frozen bit, i.e., $u_i$ can assume both
0 and 1 for $1 \leq i \leq 2^{m-j}$, the source word
$\hat{\mathbf{u}}_{2^{m-j}}$ obtained
from $\hat{\mathbf{x}}_{2^{m-j}}$ has the maximum likelihood
value, which is equal to
$\sum_{i=1}^{2^{m-j}} |\mathbf{R}_{1:2^{m-j},\,j+1}[i]|$. If a
certain source word bit is a frozen
bit, the search space of valid source words is smaller
and $\sum_{i=1}^{2^{m-j}} |\mathbf{R}_{1:2^{m-j},\,j+1}[i]|$ may
not be achieved. However,
if $\hat{\mathbf{u}}_{2^{m-j}}$ satisfies (4), this likelihood
value is achievable and
the source word $\hat{\mathbf{u}}_{2^{m-j}}$ is a valid source
word. Hence, $\hat{\mathbf{u}}_{2^{m-j}}$
is the MLD result. ■

Fig. 6. Comparison results for a (1024,512) polar code. (a) Error correction performance. (b) Average number of iterations. (c) Average required computations.
(d) Computation savings over G-matrix-based early stopping.
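Lemma 1 can be checked numerically on a small constituent code by comparing the hard-decision result against an exhaustive search. The sketch below assumes NumPy and uses the likelihood metric $\sum_i (1 - 2\mathbf{x}[i])\mathbf{R}[i]$ from the proof; the function names and the length-4 frozen pattern are hypothetical choices for illustration.

```python
import itertools
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=int)

def kron_power(k):
    """k-th Kronecker power of F (the 1x1 identity for k = 0)."""
    out = np.array([[1]], dtype=int)
    for _ in range(k):
        out = np.kron(out, F)
    return out

def likelihood(u, llr, G):
    """Metric sum_i (1 - 2 x[i]) R[i] with x = u G over GF(2)."""
    x = (u @ G) % 2
    return float(np.sum((1 - 2 * x) * llr))

def mld_exhaustive(llr, frozen_mask):
    """Search every source word whose frozen bits are 0 and return the best one."""
    llr = np.asarray(llr, dtype=float)
    frozen_mask = np.asarray(frozen_mask, dtype=bool)
    n = int(llr.size)
    G = kron_power(n.bit_length() - 1)
    free = np.flatnonzero(~frozen_mask)
    best_u, best_val = None, -np.inf
    for bits in itertools.product((0, 1), repeat=int(free.size)):
        u = np.zeros(n, dtype=int)
        u[free] = bits
        val = likelihood(u, llr, G)
        if val > best_val:
            best_u, best_val = u, val
    return best_u

def hard_decision_decode(llr):
    """Equations (2) and (3): hard decision followed by inverse encoding."""
    llr = np.asarray(llr, dtype=float)
    x_hat = (llr < 0).astype(int)
    return (x_hat @ kron_power(int(llr.size).bit_length() - 1)) % 2

def lemma_holds(llr, frozen_mask):
    """True if (4) fails (lemma does not apply) or the two decoders agree."""
    mask = np.asarray(frozen_mask, dtype=bool)
    u_hat = hard_decision_decode(llr)
    if np.any(u_hat[mask]):
        return True
    return bool(np.array_equal(u_hat, mld_exhaustive(llr, mask)))
```

Running `lemma_holds` over randomly drawn LLR vectors is a quick sanity check that, whenever the frozen set criteria are met, the cheap hard-decision path returns the same source word as the exhaustive search.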


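When a CSFG at stage $j$ is frozen, the corresponding
computations and message updating are not needed for the
rest of the iterations. We can also fix its right-to-left feedback
propagating messages $\mathbf{L}_{1:2^{m-j},\,j+1}$ for the rest
of the iterations based on its $\hat{\mathbf{x}}_{2^{m-j}}$,
since the output decoding decision
for this CSFG has already been made, and we have

$$L^{t'}_{i,j+1} = \begin{cases} +\infty & \text{if } \hat{x}_i = 0 \\ -\infty & \text{if } \hat{x}_i = 1 \end{cases}, \qquad t' \in \{t, \ldots, \text{max\_iter}\}$$
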

In one iteration, when propagating messages from left to right,
if the frozen set criteria (4) are not satisfied for a CSFG, then
we cannot freeze this CSFG. We then update the messages
at this stage using equation (1), move to the next stage and
repeat the same procedure. Fig. 5(b) shows an example. At the
second iteration, we check the bottom CSFG at stage 1.
$\hat{\mathbf{u}}_4$ does not satisfy the frozen set criteria (4)
and we cannot freeze
this CSFG. So the messages are updated at stage 1 and we
move to the next stage (stage 2) to check whether the first
unfrozen CSFG can be frozen at stage 2, as shown in Fig. 5(c). A
CSFG can only be considered for freezing if all the preceding
CSFGs at the same stage have been frozen. This procedure is
repeated until all the CSFGs at a stage are frozen or we reach
the maximum number of iterations, which corresponds to the
completion of the decoding process.
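The ordering rule can be made concrete with a small bookkeeping sketch. It assumes, consistently with the example of Fig. 5, that a frozen CSFG at stage $j$ contains two half-size CSFGs at stage $j+1$, which are therefore frozen as well; the function names and the list-of-flags representation are hypothetical.

```python
def first_unfrozen(frozen):
    """Index (0-based) of the only CSFG eligible for a freezing check, or None."""
    for k, is_frozen in enumerate(frozen):
        if not is_frozen:
            return k
    return None

def inherit_to_next_stage(frozen_stage_j):
    """Frozen flags at stage j+1: each stage-j CSFG splits into two half-size CSFGs."""
    flags = []
    for is_frozen in frozen_stage_j:
        flags.extend([is_frozen, is_frozen])
    return flags

# Situation of Fig. 5(b)-(c): at stage 1 the first CSFG is frozen, the second is not.
stage1 = [True, False]
stage2 = inherit_to_next_stage(stage1)
candidate = first_unfrozen(stage2)  # 0-based index of the CSFG to check at stage 2
```

Here `candidate` is 2, i.e. the third CSFG at stage 2, matching the CSFG checked in Fig. 5(c).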

With the freezing of CSFGs, the corresponding computations and
message updating operations do not need to be executed for the
rest of the iterations. Therefore the overall computation
complexity, and hence the energy consumption, is reduced.
Moreover, the right-to-left feedback propagating messages
$\mathbf{L}_{1:2^{m-j},\,j+1}$ are
fixed to either $-\infty$ or $+\infty$ depending on the value of
the hard-decision bit when a CSFG is frozen. This boosts the
reliability of the feedback messages and helps the remaining
unfrozen CSFGs to converge faster in the subsequent iterations,
thus helping to reduce the overall number of iterations for the
decoding and hence the average latency.

IV. SIMULATION RESULTS 

To verify the error-correcting performance and complexity
saving of the proposed frozen-CSFG-based BPD scheme,
we carry out a simulation on a polar code of length 1024
and rate 1/2 and compare the result with the original BPD
scheme (which we denote as the baseline BPD) and
the BPD using a G-matrix-based stopping criterion in [20].
Fig. 6 shows the simulation results over an AWGN channel
with BPSK modulation. For a fair comparison, we use the
same set of parameters as [20], where the min-sum approximation
with scaling parameter $\alpha = 0.9375$ and max_iter = 40
were used. As seen in Fig. 6(a), the proposed method has no
performance degradation compared with the other two existing
BPD schemes. The average numbers of iterations required for
decoding a codeword are compared in Fig. 6(b). It can be
seen that the proposed method requires the fewest
iterations, resulting in lower latency and higher throughput
compared to [20]. At SNR = 3 dB, the average number of
iterations is reduced by 46% and 17% when compared to
the baseline BPD and the G-matrix-based early stopping method,
respectively.

We also compare the overall computation complexity of
the three BPD schemes. For each PE in the factor graph, we
count the number of iterations until its operation is frozen in
the proposed scheme. We then sum, over all the PEs, the number
of iterations for which each PE is active. For the other
two schemes, since every PE needs to be executed in every
iteration, the computation complexity depends only on the
average number of iterations.
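The counting rule above can be sketched as follows. The normalization against a decoder that runs every PE for `max_iter` iterations is an assumption made for illustration (the exact normalization behind Fig. 6(c) is not stated), and the function name and the sample counts are hypothetical.

```python
def normalized_computations(active_iters_per_pe, max_iter):
    """Sum of per-PE active iterations, normalized by (number of PEs) * max_iter."""
    total = sum(active_iters_per_pe)
    return total / (len(active_iters_per_pe) * max_iter)

# An (8,4) code has 3 stages of 4 PEs each, i.e. 12 PEs.
baseline = normalized_computations([40] * 12, max_iter=40)            # every PE always active
with_freezing = normalized_computations([10] * 6 + [20] * 6, max_iter=40)
```

For schemes without freezing, every entry of the list equals the number of iterations actually run, so the metric reduces to the average iteration count divided by `max_iter`.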

Fig. 6(c) shows the normalized average number of
computations required for all three schemes. It can be observed
that the proposed scheme requires the fewest computations,
which translates directly to lower power consumption and
latency for the overall decoding process. At
SNR = 3 dB, the average computation complexity is reduced
by 65% and 46% when compared with the baseline scheme
and the early-stopping scheme, respectively. Since the
G-matrix-based early stopping method is the state-of-the-art
BPD, the computation savings of the proposed method relative to
it are shown in Fig. 6(d).

V. CONCLUSION 

In this work we have presented a novel scheme to reduce the
average number of computations as well as the average latency in
belief propagation decoding (BPD) for polar codes based on
the concept of a frozen connected sub-factor-graph. Simulation
results show that there is no performance degradation of the
proposed scheme when compared with the original belief
propagation algorithm and the G-matrix-based early stopping
criterion, while the scheme enjoys a 46-65% reduction in
computation complexity and a 17-46% reduction in latency
at SNR = 3 dB. In future work, the VLSI architecture and a
hardware implementation will be developed.